Abstract


The complete genome sequence of the first equine coronavirus (ECoV) isolate, the NC99 strain, was determined by directly sequencing 11
overlapping fragments that were RT-PCR amplified from viral RNA. The ECoV genome is 30,992 nucleotides in length, excluding the poly(A) tail.
Analysis of the sequence identified 11 open reading frames which encode two replicase polyproteins, five structural proteins (hemagglutinin esterase,
spike, envelope, membrane, and nucleocapsid), and four accessory proteins (NS2, p4.7, p12.7, and I). The two replicase polyproteins are predicted to
be proteolytically processed by three virus-encoded proteases into 16 non-structural proteins (nsp1-16). The ECoV nsp3 protein had considerable
amino acid deletions and insertions compared to the nsp3 proteins of bovine coronavirus, human coronavirus OC43, and porcine hemagglutinating
encephalomyelitis virus, the three group 2 coronaviruses phylogenetically most closely related to ECoV. The structure of subgenomic (sg) mRNAs was
analyzed by Northern blot analysis and sequencing of the leader-body junction in each sg mRNA.
Introduction


Coronaviruses are mainly associated with respiratory and 
gastrointestinal disease in humans (Drosten et al., 2003; Holmes, 
2001; Ksiazek et al., 2003; Peiris et al., 2003; van der Hoek et al., 
2004; Woo et al., 2005) and respiratory, enteric, neurological, or 
hepatic disease in animals (Holmes, 2001). Coronaviruses have 
also been isolated from bats, poultry and other birds (Cavanagh, 
2005; Chu et al., 2006; Poon et al., 2005; Ren et al., 2006). On 
the basis of antigenic and genetic analyses, coronaviruses are 
divided into three groups (Gonzalez et al., 2003; Gorbalenya et 
al., 2004; Snijder et al., 2003). Group 1 viruses include human
coronaviruses 229E (HCoV-229E) and NL63 (HCoV-NL63), 
canine coronavirus (CCoV), feline coronavirus (FCoV), porcine 
transmissible gastroenteritis virus (TGEV), porcine epidemic 
diarrhea virus (PEDV), and bat coronavirus. Group 2 viruses are 
subdivided into group 2a which includes murine hepatitis virus 
(MHV), human coronaviruses OC43 (HCoV-OC43) and HKU1
(HCoV-HKU1), bovine coronavirus (BCoV), porcine hemagglutinating
encephalomyelitis virus (PHEV), and rat coronavirus
(RCoV), and group 2b which includes SARS-coronavirus
(SARS-CoV). Group 3 viruses include avian viruses, such as 
avian infectious bronchitis virus (IBV), and turkey coronavirus 
(TCoV). 

Members of the family Coronaviridae are enveloped, 
positive-stranded RNA viruses with exceptionally large, polycistronic
genomes (27-32 kb). The 5’-proximal two-thirds of the
genome comprises two open reading frames (ORFs), ORF 1a and
ORF 1b, which encode the replicase polyproteins pp1a and
pp1ab (Ziebuhr, 2005). Expression of pp1ab requires a -1
ribosomal frameshift during translation of the genomic RNA
(Brierley et al., 1987). The two replicase polyproteins are processed
extensively by two or three viral proteases encoded by
ORF 1a to generate up to 16 end-products termed nonstructural
proteins (nsp) 1 to 16 and multiple processing intermediates
(Ziebuhr, 2005; Ziebuhr et al., 2000). The N-proximal region of
the polyproteins is processed by one or two papain-like proteases
(PLpro), whereas the central and C-proximal regions are processed
by the viral main protease, the 3C-like protease (3CLpro) (Ziebuhr,
2005; Ziebuhr et al., 2000). The 3’-proximal one-third of the
genome encodes structural proteins and various accessory 
proteins. Genes encoding the four structural proteins present in 
all coronaviruses occur in the 5’ to 3’ order as spike (S), envelope 
(E), membrane (M), and nucleocapsid (N) proteins (Brian and 
Baric, 2005; Lai et al., 2006). Some coronaviruses contain an 
additional structural protein, the hemagglutinin-esterase (HE)
protein, whose gene is located upstream of the S protein gene (Lai et
al., 2006). In contrast to the replicase proteins which are directly 
translated from the genomic RNA, coronavirus structural and 
accessory proteins are expressed from a nested set of 3’ co- 
terminal subgenomic (sg) mRNAs that also possess a common 
5’ leader sequence derived from the 5’ end of the genome 
(Pasternak et al., 2006; Sawicki et al., 2007). The common 5’ 
leader is fused to the 3’ body segments through a mechanism that 
is presumed to involve discontinuous minus strand RNA 
synthesis to produce subgenome-length templates for subgenomic
mRNA synthesis, with the transcription regulatory sequence
(TRS) elements determining the fusion sites of leader
and body segments (see recent reviews by Pasternak et al., 2006;
Sawicki et al., 2007 for details).

Equine coronavirus (ECoV) was first isolated from feces of a 
diarrheic foal in 1999 (ECoV-NC99) in North Carolina, USA 
(Guy et al., 2000). Little is known about ECoV and its clinical 
significance. Molecular characterization of ECoV and development
of diagnostic and prophylactic reagents require the
ECoV genome sequence. In this study, we determined the full-length
nucleotide sequence of the ECoV-NC99 strain of equine
coronavirus. The viral genome and proteome were analyzed and 
the predicted features of ECoV nonstructural, structural, and 
accessory proteins were compared to those of other coronaviruses.
Synthesis of sg mRNAs in ECoV-infected cells was
analyzed by Northern blotting. The leader-body junction
sequence in each sg mRNA was determined and the exact 
position of TRS used for synthesis of each sg mRNA was 
mapped on the genome. The evolutionary relationship between 
ECoV and other phylogenetically closely related group 2a 
coronaviruses was explored. 


Results and discussion 
ECoV genome sequence analysis 


We report here the full-length genomic sequence of the first 
ECoV isolate, the NC99 strain, and this is also the first reported 
complete genome sequence of ECoV. The nucleotide sequence 
was determined by directly sequencing 11 overlapping cDNA 
fragments which were RT-PCR amplified from viral RNA. The 
ECoV-NC99 genome comprises 30,992 nucleotides (nt), 
excluding the 3’ poly(A) tail, and has a GC content of 37.2%.
The nucleotide sequence data have been deposited in GenBank 
under accession number EF446615. 

Both 5’ and 3’ ends of the ECoV genome contain short 
untranslated regions (UTR). The 5’ UTR comprises 209 nt (1-209)
and includes a potential short internal ORF of 8 codons (nt
99-125). Four stem-loop structures (I, II, III, and IV) were
identified in the 5’ UTR and a short stretch of nucleotides that are
part of ORF 1a (see Supplementary Fig. S1). The bulged
stem-loops III (96-115) and IV (189-208) closely resemble the
stem-loops III and IV that have been identified as replication
signaling elements in bovine coronavirus and other group 2
coronaviruses (Raman and Brian, 2005; Raman et al., 2003; Wu
et al., 2003). The 3’ UTR of the ECoV genome comprises 289 nt
(30,704-30,992) and contains a putative bulged stem-loop
structure (nt 30,703-30,770) and a putative pseudoknot structure
(30,766-30,819) (see Supplementary Fig. S2). Similar
putative bulged stem-loop and pseudoknot structures
have been identified in murine hepatitis virus and other group 2 
coronaviruses; these have been shown to be essential for viral 
replication (Goebel et al., 2004a,b; Hsue and Masters, 1997; 
Hsue et al., 2000; Williams et al., 1999). 

Analysis of the ECoV-NC99 genome reveals 11 potential 
ORFs (1a, 1b, 2-8, 9a, and 9b) as shown in Fig. 1 and Table 1.
ORFs 1a and 1b encode the replicase polyproteins pp1a
and pp1ab. ORFs 2-8, 9a, and 9b encode the structural and
accessory proteins NS2, HE, S, p4.7, p12.7, E, M, N, and I, 
respectively. 
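
The ORF coordinates listed in Table 1 can be cross-checked with simple arithmetic. The sketch below assumes, as the published counts imply, that each span is 1-based, inclusive, and includes the stop codon; the name `orf_lengths` and the dictionary layout are illustrative and not part of the original analysis.

```python
# ORF coordinates (1-based, inclusive) copied from Table 1 of this paper.
# Assumption: every span includes its stop codon, which is what the
# published nucleotide and amino acid counts imply.
ORFS = {
    "ORF 1a (pp1a)": (210, 13499),
    "ORF2 (NS2)": (21610, 22446),
    "ORF3 (HE)": (22458, 23729),
    "ORF4 (S)": (23744, 27835),
    "ORF5 (p4.7)": (27825, 27947),
    "ORF6 (p12.7)": (28076, 28405),
    "ORF7 (E)": (28392, 28646),
    "ORF8 (M)": (28661, 29353),
    "ORF9a (N)": (29363, 30703),
    "ORF9b (I)": (29424, 30044),
}


def orf_lengths(start: int, end: int) -> tuple[int, int]:
    """Return (nucleotides, amino acids) for an ORF span that includes the stop codon."""
    nucleotides = end - start + 1
    if nucleotides % 3 != 0:
        raise ValueError("ORF span is not a whole number of codons")
    # The final codon is the stop codon and does not encode an amino acid.
    return nucleotides, nucleotides // 3 - 1


if __name__ == "__main__":
    for name, (start, end) in ORFS.items():
        nt, aa = orf_lengths(start, end)
        print(f"{name}: {nt} nt, {aa} aa")
```

ORF 1a/b is left out on purpose: its product spans a -1 frameshift, so its length cannot be obtained from a single contiguous reading frame.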

The replicase ORF 1a (nt 210-13,499) and replicase ORF 1b
(13,478-21,595) occupy 21.4 kb (69%) of the ECoV-NC99
genome. Translation of ORF 1a generates a precursor pp1a of
4429 amino acids. Similar to other coronaviruses, translation of
ORF 1b involves a -1 ribosomal frameshift, generating a 7128-amino
acid pp1ab. The ribosomal frameshift is assumed to be
directed by two signals in the ORF 1a/1b overlapping region: a
slippery sequence 5’-UUUAAAC-3’ (nt 13,472-13,478) and a
predicted downstream RNA pseudoknot structure (nt 13,484-13,559)
(see Supplementary Fig. S3). The pp1a and pp1ab
proteins are predicted to be proteolytically processed by virus-encoded
proteases into 16 non-structural proteins (nsp1-16,
Table 2) required for viral replication and transcription. By
comparison to other coronaviruses, a number of putative
functional domains are predicted in the ECoV pp1a and pp1ab
and these are summarized in Fig. 1 and Table 2 (Gorbalenya et
al., 1991, 2006; Snijder et al., 2003; Ziebuhr, 2005; Ziebuhr et 
al., 2001). Enzymatic activities of nsp3, nsp5, nsp12, nsp13, 
nsp14 and nsp15 have been experimentally confirmed for some 
coronaviruses (Barretto et al., 2005; Cheng et al., 2005; Guarino 
et al., 2005; Heusipp et al., 1997; Ivanov et al., 2004a,b; Ivanov 
and Ziebuhr, 2004; Lindner et al., 2005; Minskaia et al., 2006; 
Putics et al., 2005, 2006; Seybert et al., 2000, 2005; Tanner et al., 
2003; Ziebuhr, 2005; Ziebuhr et al., 2001). The 3CLpro (catalytic
residues His-3333 and Cys-3437) is predicted to cleave the
C-terminal half of the ECoV pp1a and the ORF1b-encoded part
of pp1ab. The putative PL1pro (catalytic residues Cys-1078 and
His-1229) and PL2pro (catalytic residues Cys-1675 and His-1832)
are predicted to process the N-proximal regions of the
ECoV pp1a (Fig. 1 and Table 2).

The most striking differences between the ECoV replicase 
and those of other group 2 coronaviruses were identified in
nsp3. The ECoV nsp3 protein has 3 aa deletions and 55 aa 
insertions compared to the nsp3 proteins of BCoV, HCoV-OC43, 
and PHEV, three viruses phylogenetically most closely related to 
ECoV. These insertions and deletions are clustered at two 
Fig. 1. Schematic diagrams of ECoV genome organization. The organization of the entire ECoV genome is depicted (middle). The 5’ leader and ORFs 1a and 1b encoding the
replicase polyproteins are shown, with the ribosomal frameshift site indicated. Structural and accessory proteins are also indicated: NS2 protein (encoded by ORF2),
hemagglutinin esterase (HE, ORF3), spike protein (S, ORF4), p4.7 protein (ORF5), p12.7 protein (ORF6), envelope protein (E, ORF7), membrane protein (M, ORF8),
nucleocapsid protein (N, ORF9a), and I protein (ORF9b). Predicted cleavage products (nsp1-nsp16) of the replicase polyproteins are depicted (bottom). Arrows
represent sites in the corresponding replicase polyproteins that are cleaved by papain-like proteases (white arrows) or the 3C-like cysteine protease (black arrows). A
number of putative functional domains predicted in the ECoV pp1a and pp1ab are indicated. PL1, papain-like proteinase 1 (aa 1059-1275); PL2, papain-like
proteinase 2 (aa 1570-1867); X, X-domain which contains adenosine diphosphate-ribose 1''-phosphatase (ADRP) (aa 1276-1435); TM, transmembrane domain;
3CL, 3C-like proteinase; RdRp, RNA-dependent RNA polymerase; Z, zinc-binding domain; HEL, helicase domain; ExoN, exonuclease; N, nidoviral uridylate-specific
endoribonuclease (NendoU); MT, 2’-O-ribose methyltransferase (2’-O-MT). Domains Ac (aa 846-1058) and Y (aa 2310-2796) are described by Ziebuhr et
al. (2001). The spike protein (1363 amino acids) of ECoV is represented by a black line (top). The N-terminal signal peptide (amino acid residues 1-14 or 1-17),
heptad repeat 1 (HR1, amino acid residues 991-1092), heptad repeat 2 (HR2, amino acid residues 1259-1304), the transmembrane domain (amino acid residues
1308-1330), and the cytoplasmic domain (amino acid residues 1331-1363) are depicted. A potential cleavage recognition sequence (RRQRR) at residues 764-768
and the predicted cleavage site between residues 768 and 769 are indicated. The resulting cleavage products, the S1 and S2 subunits, are depicted. The positions of the
receptor-binding domain on the S1 subunit and the fusion peptide on the S2 subunit are currently unknown.


regions: the Ac domain and the region between the PL2pro and
the Y domain. The functional significance of these insertions and
deletions is unknown as yet; however, the functions of PL1pro,
PL2pro, and ADRP are not anticipated to be affected since the
insertions and deletions are not located in the functional domains
of these enzymes (Fig. 1).

ORF2 (nt 21,610-22,446) of ECoV-NC99 encodes the
predicted NS2 protein with 278 amino acids. The NS2 of


Table 1

Coding potential of the ECoV-NC99 genome sequence

ORF         Encoded   Nucleotide position   No. of        No. of amino   mRNA used for
            protein   in the genome         nucleotides   acids (aa)     expression (a)

5’ Leader             1-64                  64
5’ UTR                1-209                 209
ORF 1a      pp1a      210-13,499            13,290        4429           1
ORF 1a/b    pp1ab     210-21,595            21,386        7128           1
ORF2        NS2       21,610-22,446         837           278            2
ORF3        HE        22,458-23,729         1272          423            3
ORF4        S         23,744-27,835         4092          1363           4
ORF5        p4.7      27,825-27,947         123           40             5
ORF6        p12.7     28,076-28,405         330           109            6
ORF7        E         28,392-28,646         255           84             7
ORF8        M         28,661-29,353         693           230            8
ORF9a       N         29,363-30,703         1341          446            9
ORF9b       I         29,424-30,044         621           206            9
3’ UTR                30,704-30,992         289

(a) The mRNA used for expression of each protein is derived from the Northern blotting analysis and the comparison with other group 2a coronaviruses. See the text
for details.
Table 2

Predicted end-products of proteolytic processing of the ECoV replicase polyproteins pp1a and pp1ab

Cleavage   Nucleotide      Polyprotein   Position in        Length   Putative functional                              Putative proteases predicted
product    position (a)                  pp1a/pp1ab (aa)    (aa)     domain(s) (b)                                    to release protein from polyproteins

nsp1       210-941         pp1a/pp1ab    Met1-Gly244        244                                                       PL1pro
nsp2       942-2744        pp1a/pp1ab    Val245-Ala845      601                                                       PL1pro
nsp3       2745-8597       pp1a/pp1ab    Gly846-Gly2796     1951     Ac, PL1pro, ADRP, PL2pro, TM1, Y                 PL2pro
nsp4       8598-10,085     pp1a/pp1ab    Ala2797-Gln3292    496      TM2                                              PL2pro + 3CLpro
nsp5       10,086-10,994   pp1a/pp1ab    Ser3293-Gln3595    303      3CLpro                                           3CLpro
nsp6       10,995-11,855   pp1a/pp1ab    Ser3596-Gln3882    287      TM3                                              3CLpro
nsp7       11,856-12,122   pp1a/pp1ab    Ser3883-Gln3971    89       Part of RNA-binding hexadecameric supercomplex   3CLpro
nsp8       12,123-12,713   pp1a/pp1ab    Ala3972-Gln4168    197      Part of RNA-binding hexadecameric supercomplex   3CLpro
nsp9       12,714-13,043   pp1a/pp1ab    Asn4169-Gln4278    110      ssRNA-binding protein                            3CLpro
nsp10      13,044-13,454   pp1a/pp1ab    Ala4279-Gln4415    137      2 zinc fingers                                   3CLpro
nsp11      13,455-13,496   pp1a          Ser4416-Ser4429    14                                                        3CLpro
nsp12      13,455-16,237   pp1ab         Ser4416-Gln5343    928      RdRp                                             3CLpro
nsp13      16,238-18,034   pp1ab         Ser5344-Gln5942    599      ZBD, HEL                                         3CLpro
nsp14      18,035-19,597   pp1ab         Cys5943-Gln6463    521      Exonuclease (ExoN)                               3CLpro
nsp15      19,598-20,695   pp1ab         Ser6464-Gln6829    366      NendoU                                           3CLpro
nsp16      20,696-21,592   pp1ab         Ala6830-Ile7128    299      2’-O-MT                                          3CLpro

Domains Ac and Y are described by Ziebuhr et al. (2001).

(a) Nucleotide position means the location of the nucleotides encoding the corresponding proteins in the entire genome of the equine coronavirus NC99 strain.

(b) PL1pro, papain-like proteinase 1; PL2pro, papain-like proteinase 2; ADRP, adenosine diphosphate-ribose 1''-phosphatase (formerly known as ‘X-domain’);
3CLpro, 3C-like proteinase; TM, transmembrane domain; GFL, growth factor-like domain; RdRp, RNA-dependent RNA polymerase; ZBD, zinc-binding domain;
HEL, helicase domain; NendoU, nidoviral uridylate-specific endoribonuclease; 2’-O-MT, 2’-O-ribose methyltransferase.


ECoV has 67%, 67%, and 45% amino acid identity with the 
respective NS2 proteins of BCoV, HCoV-OC43, and PHEV. The 
lower amino acid identity with PHEV may be attributable to the 
fact that PHEV has a truncated NS2 protein (Vijgen et al., 2006). 
Sequence analysis revealed that the ECoV NS2 protein contains 
a domain (aa 46-135) with similarity to the putative cyclic 
phosphodiesterase (CPD, Martzen et al., 1999). The CPD 
domain has also been identified in the NS2 proteins of other 
group 2a coronaviruses as well as at the C-terminal end of the pp1a protein
of toroviruses (Gorbalenya et al., 2006; Snijder et al., 1991, 
2003). The NS2 of ECoV was predicted to contain 9 potential 
phosphorylation sites. The NS2 of ECoV does not contain a 
signal peptide and is a non-secretory protein. The function of the 
NS2 protein in coronaviruses has not been studied in detail. It is 
known that the NS2 gene is non-essential for MHV replication in 
transformed cells (Schwarz et al., 1990). However, a recent 
study showed that a point mutation in the NS2 of MHV led to its 
attenuation in mice in spite of its wild-type replication in tissue 
culture (Sperry et al., 2005). 

ORF3 (nt 22,458-23,729) of ECoV-NC99 encodes the
predicted HE protein containing 423 amino acids. Nine potential
N-glycosylation sites were predicted. SignalP analysis revealed
a signal peptide probability of 0.802 with a potential cleavage
site between residues 17 and 18. The N-terminal 390 amino
acids were predicted to be located outside the cell surface
or viral envelope, with a transmembrane helix at amino acids
391-413 and an internal domain at amino acids 414-423. The
putative active site for esterase activity, FGDS (Kienzle et al.,
1990), is present at amino acids 36-39 of the HE protein in
ECoV.


ORF4 (nt 23,744-27,835) of ECoV-NC99 encodes the 
predicted spike (S) protein containing 1363 amino acids. Eighteen 
potential N-glycosylation sites were predicted. An N-terminal 
signal peptide was identified with a potential cleavage site 
between amino acids 14 and 15 predicted by SignalP-NN or 
between amino acids 17 and 18 predicted by SignalP-HMM. 
The ECoV S protein was predicted to be a typical type I 
membrane protein with the N-terminal 1307 residues exposed 
on the outside of the cell surface or virus particle, a 
transmembrane domain near the C terminus (residues 1308-1330),
followed by a cytoplasmic tail (residues 1331-1363).
Following multiple alignments with the S proteins of other
group 2a coronaviruses, a potential cleavage recognition
sequence (RRQRR) was identified at residues 764-768, which
would predict a cleavage between amino acids 768 and 769,
separating the ECoV S protein into S1 and S2 subunits (Fig. 1). 
The ECoV S1 subunit is expected to contain a receptor-binding 
domain whose position has not yet been determined. The S2 
subunit is predicted to mediate membrane fusion. Two heptad 
repeat (HR) regions, which are conserved in position and 
sequence among the three groups of coronaviruses and play 
important roles in membrane fusion (see reviews of Eckert and 
Kim, 2001; Hernandez et al., 1996), were identified in the ECoV 
S2 subunit (HR1: aa 991-1092; HR2: aa 1259-1304) (Fig. 1). 
The ECoV S2 subunit is anticipated to possess a fusion peptide
whose position is as yet unknown. Some coronavirus S proteins
have been shown to contain important neutralization epitopes 
(Godet et al., 1994; Kubo et al., 1994; Yoo et al., 1991) and 
mutations in the S protein have been associated with altered viral 
antigenicity and pathogenicity (Ballesteros et al., 1997; Bernard 


96 J. Zhang et al. / Virology 369 (2007) 92-104 


[Fig. 2 image. Legible band labels, mRNA (size in kb): 1 (31), 4 (7.3), 5 (3.2), 7 (2.8), 8 (2.4), 9 (1.7).]


Fig. 2. Northern blot analysis of intracellular RNA isolated from ECoV-infected 
HRT-18G cells. A DIG-labeled probe complementary to the 3’ end
(nt 30,660-30,946) of the ECoV genome was used to detect the genomic and
subgenomic mRNAs in ECoV-infected (lane 2) and mock-infected (lane 1) 
HRT-18G cells at 72 h p.i. 


and Laude, 1995; Dalziel et al., 1986; Gallagher and Buchmeier, 
2001; Leparc-Goffart et al., 1997). Whether the S protein of 
ECoV has such properties remains to be determined. 
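
The predicted S protein features described above can be collected into a small map to see which subunit each feature falls in. This is a minimal sketch using only the residue ranges stated in the text; the names `FEATURES` and `subunit_of` are illustrative, and S1 is counted from residue 1, that is, with the signal peptide still attached.

```python
S_LENGTH = 1363        # amino acids in the predicted S protein
CLEAVAGE_AFTER = 768   # predicted cleavage between residues 768 and 769

# Residue ranges (1-based, inclusive) as stated in the text.
FEATURES = {
    "cleavage motif RRQRR": (764, 768),
    "HR1": (991, 1092),
    "HR2": (1259, 1304),
    "transmembrane domain": (1308, 1330),
    "cytoplasmic tail": (1331, 1363),
}


def subunit_of(start: int, end: int, cleavage_after: int = CLEAVAGE_AFTER) -> str:
    """Name the subunit that contains the residue range start..end."""
    if end <= cleavage_after:
        return "S1"
    if start > cleavage_after:
        return "S2"
    return "spans the S1/S2 boundary"


if __name__ == "__main__":
    print(f"S1: {CLEAVAGE_AFTER} aa, S2: {S_LENGTH - CLEAVAGE_AFTER} aa")
    for name, (start, end) in FEATURES.items():
        print(f"{name}: {start}-{end} ({end - start + 1} aa) in {subunit_of(start, end)}")
```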

ORF5 (nt 27,825-27,947) of ECoV-NC99 is predicted to
encode a hypothetical protein of 40 amino acids with an 
estimated molecular weight of 4.7 kDa (termed p4.7 protein). It 
was predicted to be a non-secretory protein and did not contain 
any transmembrane helix. This protein is not closely matched to 
any known protein based on a search using BLASTP, PSI- 
BLAST, or FASTA programs. 


ORF6 (nt 28,076-28,405) of ECoV-NC99 is predicted to
encode a protein of 109 amino acids corresponding to the BCoV 
12.7 kDa non-structural protein (p12.7). This ORF overlaps by 
15 nucleotides with the ORF7 that encodes the E protein. No 
signal peptide or any transmembrane helix was present. No 
N-glycosylation site was found. 

ORF7 (nt 28,392-28,646) of ECoV-NC99 encodes the predicted
E protein containing 84 amino acids. No N-glycosylation
site was identified. It was predicted to contain a signal anchor
(probability 0.999). One transmembrane domain was predicted
at residues 18-36 by TMpred analysis or at residues 15-37 by
TMHMM analysis. Both programs predicted the N-terminus of
the protein to be external to the cell surface or viral envelope. In 
the case of other coronaviruses, there is increasing evidence 
that the E protein together with the M protein is instrumental in 
viral assembly and budding; the cytoplasmic tails of both 
proteins have an important interactive role in this process 
(Corse and Machamer, 2000, 2002, 2003; Vennema et al., 
1996). 

ORF8 (nt 28,661-29,353) of ECoV-NC99 encodes the
predicted M protein containing 230 amino acids. It was
predicted to contain a signal anchor (probability 0.947). Three
transmembrane domains were predicted to be present at
positions 25-46, 57-78, and 81-102 by TMpred analysis or
at positions 25-44, 49-71, and 81-103 by TMHMM analysis.
The N-terminal 24 amino acid residues were predicted to be
outside and the C-terminal 127- or 128-amino acid hydrophilic
domain was predicted to be inside the virus. One potential
N-glycosylation site was predicted at position 26 (NFS).
Potential O-glycosylation sites were predicted at the
extreme N-terminus of the M protein (MSSTPTPAPGYT).
Whether these sites are glycosylated needs to be
experimentally verified. Previous studies have shown that the M
proteins of group 1 and 3 coronaviruses (e.g. TGEV and IBV) are
N-glycosylated, whereas the M protein of the group 2 coronavirus
MHV is only O-glycosylated (de Haan et al., 2002; Lai et al.,
2006). The M protein is the most abundant envelope component 
and plays a key role in coronavirus assembly by interacting with 
the E, S, N and HE proteins (Bosch et al., 2005; de Haan and 
Rottier, 2005, and references therein). 
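
Potential N-glycosylation sites such as the NFS motif mentioned above are conventionally recognized by the sequon N-X-S/T, where X is any residue except proline. The sketch below is a generic sequon scan and not the prediction tool used in this study. The function name `find_sequons` is illustrative, and the first demonstration peptide is a made-up placeholder that merely puts NFS at position 26.

```python
import re

# N-X-S/T with X != P. The lookahead lets overlapping sequons be reported.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(protein: str) -> list[tuple[int, str]]:
    """Return (1-based position of the Asn, tripeptide) for each sequon found."""
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(protein.upper())]


if __name__ == "__main__":
    # Hypothetical peptide, NOT the ECoV M protein sequence.
    toy = "A" * 25 + "NFSAAA"
    print(find_sequons(toy))
    # The N-terminal dodecapeptide quoted in the text contains no Asn.
    print(find_sequons("MSSTPTPAPGYT"))
```

A matching tripeptide only marks a candidate site. Whether it is actually glycosylated depends on its accessibility in the folded protein, which is why the text calls for experimental verification.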


Table 3

Oligonucleotide primers used for RT-PCR amplification of the leader-body junction of sg mRNAs

Primer ID   Position        Sequence (5’-3’)            Use

22813N      22,792-22,813   GCGTTATCACCAGAAGCGGTGC      Reverse transcription for mRNA2 (NS2) and reverse primer for mRNA3 (HE) PCR
25095N      25,076-25,095   CGCCTATTCCAGGCAGAAGG        Reverse transcription for mRNA3 (HE) and mRNA4 (S)
29101N      29,078-29,101   GGCAGTAAGAGTATGATGGTCCTC    Reverse transcription for mRNA5 (p4.7), mRNA6 (p12.7) and mRNA7 (E)
30945N      30,921-30,945   CTGGGTGGTAACTTAACATGCTGGC   Reverse transcription for mRNA8 (M) and mRNA9 (N)
IP          1-21            GATTGTGAGCGAATTGCGTGC       Forward primer for all sg mRNA PCR
21982N      21,958-21,982   GACGGGACTGACCAACTACACAACC   Reverse primer for mRNA2 (NS2) PCR
24283N      24,262-24,283   GCGTGGTGACCCAATACCACTG      Reverse primer for mRNA4 (S) PCR
28100N      28,078-28,100   TCCTCTCAGGTCTCCAGATGTCC     Reverse primer for mRNA5 (p4.7) PCR
28334N      28,312-28,334   CAGCCTCCTCTATAGTATTGGCG     Reverse primer for mRNA6 (p12.7) PCR
28641N      28,617-28,641   CGTCATCCACATTAAGGACTGGTGG   Reverse primer for mRNA7 (E) PCR
29016N      28,992-29,016   GGGTTGAAACTCCACCAACTACCAG   Reverse primer for mRNA8 (M) PCR
29710N      29,691-29,710   GCGTTGATTGCCATCGGCTG        Reverse primer for mRNA9 (N) PCR
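
Two quick sanity checks can be run on the primers in Table 3: each primer's length should equal the span of its stated genome position, and the reverse complement of a reverse primer gives the plus-strand genome sequence it anneals to. The sketch below uses only the table's values; the helper names are illustrative.

```python
# (start, end, sequence 5'-3') for each primer, copied from Table 3.
PRIMERS = {
    "22813N": (22792, 22813, "GCGTTATCACCAGAAGCGGTGC"),
    "25095N": (25076, 25095, "CGCCTATTCCAGGCAGAAGG"),
    "29101N": (29078, 29101, "GGCAGTAAGAGTATGATGGTCCTC"),
    "30945N": (30921, 30945, "CTGGGTGGTAACTTAACATGCTGGC"),
    "IP": (1, 21, "GATTGTGAGCGAATTGCGTGC"),
    "21982N": (21958, 21982, "GACGGGACTGACCAACTACACAACC"),
    "24283N": (24262, 24283, "GCGTGGTGACCCAATACCACTG"),
    "28100N": (28078, 28100, "TCCTCTCAGGTCTCCAGATGTCC"),
    "28334N": (28312, 28334, "CAGCCTCCTCTATAGTATTGGCG"),
    "28641N": (28617, 28641, "CGTCATCCACATTAAGGACTGGTGG"),
    "29016N": (28992, 29016, "GGGTTGAAACTCCACCAACTACCAG"),
    "29710N": (29691, 29710, "GCGTTGATTGCCATCGGCTG"),
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(_COMPLEMENT)[::-1]


def span_matches_length(start: int, end: int, sequence: str) -> bool:
    """True if the inclusive span start..end is as long as the sequence."""
    return end - start + 1 == len(sequence)


if __name__ == "__main__":
    for name, (start, end, seq) in PRIMERS.items():
        ok = span_matches_length(start, end, seq)
        print(f"{name}: {len(seq)} nt, span consistent: {ok}")
    # Plus-strand (cDNA sense) target of the reverse primer 28641N.
    print(reverse_complement(PRIMERS["28641N"][2]))
```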
[Fig. 3 image: alignments of the leader, sg mRNA, and genome sequences around the leader-body junction of each sg mRNA, including mRNA5 (p4.7), mRNA6 (p12.7), mRNA7 (E), mRNA8 (M), and mRNA9 (N).]


Fig. 3. ECoV sg mRNA leader-body junction and flanking sequences. The sg mRNA sequences are shown in alignment with the leader and the genome sequences.
The genomic positions of the nucleotides in the leader and genome sequences are indicated. The start codon AUG in each sg mRNA is depicted in bold. Boxed regions
are the putative TRS used for each sg mRNA synthesis. The 36N and 112N in parentheses indicate that 36 and 112 nucleotides in that region are not shown.
Homologous nucleotides between the leader and the mRNA or between the mRNA and the genome are indicated with connecting lines.


ORF9a (nt 29,363-30,703) of ECoV-NC99 encodes the
predicted N protein containing 446 amino acids. It was predicted 
to contain 36 potential phosphorylation sites. No signal peptide 
or any transmembrane helix was present. The N protein of 
coronaviruses has been shown to be multifunctional; its functions include
interaction with the viral RNA genome to form the viral nucleocapsid,
interaction with the M protein, and self-association
(Masters, 1992; Narayanan et al., 2000, 2003).
Recently it has also been reported that the N protein may play a 
role in coronavirus replication (Almazan et al., 2004; Schelle 
et al., 2005). 

ORF9b (nt 29,424-30,044) of ECoV-NC99, which lies within
ORF9a (encoding the N protein), encodes a
hypothetical protein (I) containing 206 amino acids. It was predicted to contain
10 potential phosphorylation sites. No signal peptide or
transmembrane helix was present. In the case of MHV,
expression of the I protein has been detected in virus-infected
cells, but this protein is nonessential for viral replication and
production (Fischer et al., 1997).


Northern blot analysis of ECoV genomic and subgenomic 
mRNAs 


It is generally accepted that the replicase proteins are directly 
synthesized from the coronavirus genome, whereas the 
structural and accessory proteins are expressed from a nested 
set of subgenomic mRNAs. However, the number of sg
mRNAs and the characteristics and expression pattern of the
proteins they encode (e.g. a sg mRNA may sometimes express
multiple proteins) vary for each virus. In order to investigate
ECoV sg mRNA synthesis, Northern blot analysis was
performed to evaluate the synthesis of genomic and subgenomic
RNAs in ECoV-infected cells. A digoxigenin-labeled
RNA probe complementary to the 3’ end (nt 30,660-30,946) of
the ECoV genome was used for Northern blot hybridization.
As shown in Fig. 2, nine mRNAs were detected in
ECoV-infected HRT-18G cells at 72 h p.i. Absence of such
mRNAs in mock-infected cells confirms that these mRNAs are 
ECoV-specific. According to the estimated sizes of the mRNAs, 
it is reasonable to assume that sg mRNAs 2-8 express the NS2, 
HE, S, p4.7, p12.7, E, and M proteins, respectively and that 
mRNA 9 expresses the N protein and probably the I protein as 
well. 
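
Because the sg mRNAs form a 3’ co-terminal nested set, the size of each can be roughly anticipated from the position of its first ORF. The sketch below is a back-of-the-envelope estimate, not the sizing method used for the blot. It approximates each sg mRNA as the 64-nt leader plus the genome from the start of its 5’-most ORF to the 3’ end, and it ignores the short stretch between the body TRS and the start codon as well as the poly(A) tail, so real transcripts are somewhat longer. The names are illustrative.

```python
GENOME_LENGTH = 30992   # nt, excluding the poly(A) tail
LEADER_LENGTH = 64      # nt, common 5' leader on sg mRNAs

# Start coordinate of the 5'-most ORF on each sg mRNA (from Table 1).
FIRST_ORF_START = {
    2: 21610,  # NS2
    3: 22458,  # HE
    4: 23744,  # S
    5: 27825,  # p4.7
    6: 28076,  # p12.7
    7: 28392,  # E
    8: 28661,  # M
    9: 29363,  # N (and I)
}


def approx_sg_mrna_kb(orf_start: int) -> float:
    """Approximate sg mRNA length in kb: leader plus ORF start to the 3' end."""
    body = GENOME_LENGTH - orf_start + 1
    return round((LEADER_LENGTH + body) / 1000, 1)


if __name__ == "__main__":
    for mrna, start in FIRST_ORF_START.items():
        print(f"mRNA {mrna}: about {approx_sg_mrna_kb(start)} kb")
```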


Determination of leader-body junction sequences of sg mRNAs


There is a general agreement that the TRS elements 
determine the fusion sites of the 5’ leader and the 3’ body 
segments in coronavirus sg mRNAs. In order to determine the 
precise location of the leader and body TRSs used for ECoV sg 
mRNA synthesis, the leader-body junction and flanking
sequences of each ECoV sg mRNA were determined using
sg mRNA-specific RT-PCRs (see Table 3 and Materials and
methods for details). The sg mRNA sequences were aligned to
the leader and corresponding ‘body’ genome sequences as shown in Fig.
3. Analysis of the leader-body junction sequences revealed
that the core sequence of the TRS motifs is 5’-UCUAAAC-3’.
The leader TRS (5’-UCUAAAC-3’) and the body TRS
(5’-UCUAAAC-3’) used for synthesizing HE mRNA, S mRNA,
and N mRNA exactly match each other. There is one mismatch
between the leader TRS and the body TRS (5’-UCUAAAA-3’)
used for generating the mRNA of the NS2 protein. There is
also one mismatch between the leader TRS and the body TRS
(5’-UCCAAAC-3’) used for generating E mRNA and M
mRNA. There are two mismatches between the leader TRS
and the body TRS (5’-UUAAAAC-3’) used for generating the
mRNA of the p4.7 protein. Interestingly, in the case of the
mRNA of the p12.7 protein, the leader and the body segment
are joined at the unusual consensus variant
5’-UAAACUUUAUAA-3’. Previously it has been shown that the
mRNA of the p12.7 protein of BCoV also utilizes an unusual 
consensus variant for joining the leader and body segment 
(Hofmann et al., 1993). From the sequence data, we conclude 
that the ECoV common leader on sg mRNAs is the first 64 
nucleotides of the ECoV genome. 
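
The mismatch counts quoted above can be reproduced by comparing each body TRS with the leader TRS position by position. This is a minimal sketch using the heptamer sequences given in the text; the names are illustrative, and the p12.7 junction is left out because its variant sequence is not a heptamer and cannot be compared position by position.

```python
LEADER_TRS = "UCUAAAC"

# Body TRS heptamers as reported in the text, keyed by the encoded protein.
BODY_TRS = {
    "NS2": "UCUAAAA",
    "HE": "UCUAAAC",
    "S": "UCUAAAC",
    "p4.7": "UUAAAAC",
    "E": "UCCAAAC",
    "M": "UCCAAAC",
    "N": "UCUAAAC",
}


def mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    for protein, body in BODY_TRS.items():
        print(f"{protein}: {body} -> {mismatches(LEADER_TRS, body)} mismatch(es)")
```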


Phylogenetic analysis of ECoV 


Phylogenetic analyses of ECoV and other coronaviruses were 
performed based on the amino acid sequences of replicase 
polyprotein pp1a, the ORF1b-encoded part of pp1ab, S, E,
M, and N. Phylogenetic analysis clustered coronaviruses into 
three major groups (G1, G2a, and G3) irrespective of the gene 
used for analysis (Fig. 4). The SARS-CoV forms a separate 
branch and is classified as subgroup 2b (G2b) as suggested 
previously (Gorbalenya et al., 2004; Snijder et al., 2003). 
Phylogenetic analysis clearly demonstrated that ECoV falls into 
the cluster of group 2a coronaviruses and is most closely related 
to BCoV, HCoV-OC43, and PHEV. 

To further explore the possible evolutionary relationships 
among ECoV, BCoV, HCoV-OC43, and PHEV, the genetic 
distances of ECoV, BCoV, and PHEV to HCoV-OC43 were 
determined over the entire genome using the SimPlot analysis 
(Lole et al., 1999). As shown in Fig. 5, the BCoV strains and 
HCoV-OC43 had the lowest genetic distances over the complete 
genome. The genetic distance between PHEV and HCoV-OC43 
was similar to that between BCoV and HCoV-OC43 in most 
regions of the genome, with the exception of the spike gene, 
where the distance of PHEV to HCoV-OC43 was markedly 
greater than that of BCoV to HCoV-OC43. The genetic distance 
of ECoV to HCoV-OC43 was markedly greater than that of 
either BCoV or PHEV to HCoV-OC43 in the first half of ORF1a, 
the central part of ORF1b, and the NS2 and HE genes. In the 
spike gene, the distance between ECoV and HCoV-OC43 was 
similar to that between PHEV and HCoV-OC43 but much 
greater than that between BCoV and HCoV-OC43. The genetic 
distances of BCoV and PHEV to HCoV-OC43 observed in this 
study are consistent with previously reported findings (Vijgen 
et al., 2005, 2006). Vijgen et al. (2005, 2006) concluded that 
PHEV diverged from the common ancestor before BCoV and 
HCoV-OC43. Our analysis suggests that ECoV diverged from 
the common ancestor earlier than PHEV. In summary, ECoV 
emerged earlier than PHEV, BCoV, and HCoV-OC43, even 
though ECoV was not isolated until 1999, from a diarrheic 
foal in the USA. 
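
As an illustration of the sliding-window comparison underlying Fig. 5, the following minimal Python sketch computes the uncorrected proportion of differing sites (p-distance) in successive windows of a pairwise alignment. The 400-nt window and 200-nt step follow the Fig. 5 legend; the function names, the use of p-distance, and the skipping of gap columns are assumptions made for illustration and are not claimed to reproduce the distance model implemented in SimPlot.

```python
def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Columns containing a gap ('-') in either sequence are skipped
    (an assumption of this sketch). Returns None if no column is usable.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return None
    return sum(x != y for x, y in pairs) / len(pairs)


def sliding_distance(query, reference, window=400, step=200):
    """Return (window midpoint, p-distance) pairs along a pairwise alignment."""
    if len(query) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    results = []
    for start in range(0, len(query) - window + 1, step):
        end = start + window
        midpoint = start + window // 2
        results.append((midpoint, p_distance(query[start:end], reference[start:end])))
    return results


# Toy aligned sequences (hypothetical, not viral data): a 200-nt divergent block.
toy_reference = "A" * 1000
toy_query = "A" * 400 + "C" * 200 + "A" * 400
toy_profile = sliding_distance(toy_query, toy_reference)
```

Plotting the second element of each pair against the midpoint gives a distance profile of the kind shown in Fig. 5, with one curve per query sequence.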


Fig. 4. Phylogenetic analysis of the amino acid sequences of replicase polyprotein pp1a, the ORF1b-encoded part of pp1ab, spike (S), envelope (E), membrane (M), 
and nucleocapsid (N) of ECoV-NC99. Multiple amino acid sequence alignments were carried out by using ClustalX 1.83 and the unrooted neighbor-joining trees were 
constructed using PAUP 4.0b10. Bootstrap analysis was carried out on 1000 replicate data sets. CCoV, canine coronavirus (GenBank accession number D13096); 
TGEV, porcine transmissible gastroenteritis virus Purdue (NC_002306); FCoV, feline coronavirus (NC_007025); HCoV-NL63, human coronavirus NL63 
(NC_005831); HCoV-229E, human coronavirus 229E (NC_002645); PEDV, porcine epidemic diarrhea virus CV777 (NC_003436); BCoV, bovine coronavirus ENT 
(NC_003045); HCoV-OC43, human coronavirus OC43 strain VR759 (NC_005147); PHEV, porcine hemagglutinating encephalomyelitis virus VW572 (DQ011855); 
MHV, murine hepatitis viruses A59 (NC_001846) and JHM (NC_006852); SDAV, rat sialodacryoadenitis coronavirus (AF207551); HCoV-HKU1, human coronavirus HKU1 
(NC_006577); SARS-CoV, SARS coronavirus Tor2 (NC_004718); IBV, avian infectious bronchitis virus Beaudette (NC_001451). 

Fig. 5. Genetic distance between ECoV, BCoV, PHEV and HCoV-OC43. The average genetic distances were calculated over the entire genome using the SimPlot 
program with a sliding window size of 400 bp and a step size of 200 bp. Each curve represents a comparison of the sequence data of ECoV-NC99, the BCoV strains, 
and PHEV-VW572 to the reference sequence data of the HCoV-OC43 ATCC strain VR759 (NC_005147). The sequence data of the BCoV strains used for comparison 
are the 50% consensus sequence of six BCoV strains: BCoV-ENT (NC_003045), BCoV-Alpaca (DQ915164), BCoV-DB2 (DQ811784), BCoV-Mebus (U00735), 
BCoV-Quebec (AF220295), and BCoV-LUN (AF391542). The linear representation of the ECoV-NC99 genome is shown at the top of the diagram. 


Conclusion 
In this study, we have determined the first complete genome 
sequence of ECoV and provided the first comprehensive analysis 
of the ECoV genome. Completion of the genome sequence of 
ECoV will contribute to our understanding of this virus at the 
molecular level and also enrich the database of coronaviruses. 
The sequence data are expected to aid in the development of 
diagnostic and prophylactic reagents. The sequence data of 
ECoV-NC99 will also help identify and characterize other ECoV 
isolates and enhance our understanding of the molecular 
epidemiology of coronaviruses. Neonatal enterocolitis is an 
economically significant disease for horse breeders. Further 
studies are needed to determine the prevalence of ECoV in- 
fection in equine populations and the relative role of ECoV as a 
cause of enteric disease in horses. 


Materials and methods 
Cells and virus 


The human rectal tumor cell line HRT-18G (American 
Type Culture Collection [ATCC, CRL-11663]) was grown in 
Dulbecco’s modified Eagle’s medium (DMEM) supplemented 
with 4 mM L-glutamine, 5% fetal bovine serum, and 
penicillin/streptomycin at 37 °C in the presence of 5% CO2. 
The equine coronavirus-NC99 (Guy et al., 2000) was 
propagated once in HRT-18G cells to produce the working 
virus stocks. 


Isolation of viral RNA, RT-PCR amplification and sequencing 


The complete genome of ECoV was determined by sequencing 
11 overlapping RT-PCR products encompassing the entire 
genome (nt 1-3615; nt 3446-5458; nt 4953-6600; nt 5497-9678; 
nt 9347-13,021; nt 12,451-15,736; nt 15,425-19,307; 
nt 19,039-22,812; nt 22,566-26,390; nt 26,065-29,662; and nt 
29,363-30,992). Viral RNA was isolated from ECoV stocks using 
the QIAamp viral RNA mini kit (Qiagen). Viral RNA was first 
reverse transcribed with AccuScript reverse transcriptase (Strata- 
gene) following the manufacturer’s instructions. Then, PCR 
amplification was performed with proofreading PfuUltra high-fidelity 
DNA polymerase (Stratagene) in a volume of 50 µl: 5 µl 
PfuUltra PCR buffer (10×), 1.0 µl dNTP mix (10 mM each), 1 µl of 
each primer (20 µM), 2 µl cDNA template, 1 µl PfuUltra DNA 
polymerase, and 39.0 µl nuclease-free water. The reaction mixtures 
were incubated at 95 °C for 2 min, followed by 35 cycles of 
amplification at 95 °C for 45 s, 50-53 °C for 45 s, and 72 °C for 
4.5 min, with a final incubation at 72 °C for 10 min. The PCR 
products were gel-purified using QIAquick gel extraction kit 
(Qiagen). Both sense and anti-sense strands were sequenced using 
the Applied Biosystems Big Dye Terminator V3.0 sequencing 
chemistry on ABI 3730 DNA sequencers (Davis Sequencing 
Center). Partial genomic sequence (9487 nucleotides) of ECoV had 
been previously determined by two groups (Guy et al., 2000, 
GenBank accession number AF251144; Wu et al., 2003, 
AF523846 and AF523850; H.Y. Wu, J.S. Guy, and D.A. Brian, 
unpublished data, AY316300). These regions were re-sequenced in 
this study. To determine the remaining genomic sequence of ECoV- 
NC99, initial RT-PCR and sequencing primers were designed 
based on multiple alignments of the genomes of BCoV (GenBank 
accession number NC_003045), HCoV-OC43 (NC_005147), 
PHEV (DQ011855), and MHV (NC_001846); additional primers 
were designed based on the results of the first and subsequent 
rounds of sequencing. All primer sequences are listed in 
Supplementary Table S1. 
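
The tiling of the 11 RT-PCR products can be checked with a short script. The sketch below, written in Python for illustration, takes the fragment coordinates and the genome length (30,992 nt) from the text and confirms that consecutive fragments overlap and together span the genome; the helper names are hypothetical.

```python
# Fragment coordinates (1-based, inclusive) as listed in the text.
FRAGMENTS = [
    (1, 3615), (3446, 5458), (4953, 6600), (5497, 9678),
    (9347, 13021), (12451, 15736), (15425, 19307), (19039, 22812),
    (22566, 26390), (26065, 29662), (29363, 30992),
]
GENOME_LENGTH = 30992  # nucleotides, excluding the polyA tail


def consecutive_overlaps(fragments):
    """Overlap (nt) between each fragment and the next; <= 0 would mean a gap."""
    return [
        prev_end - next_start + 1
        for (_, prev_end), (next_start, _) in zip(fragments, fragments[1:])
    ]


def covers_genome(fragments, genome_length):
    """True if the fragments tile positions 1..genome_length without gaps."""
    ordered = sorted(fragments)
    if ordered[0][0] != 1:
        return False
    reach = ordered[0][1]
    for start, end in ordered[1:]:
        if start > reach + 1:  # uncovered positions between fragments
            return False
        reach = max(reach, end)
    return reach >= genome_length


overlaps = consecutive_overlaps(FRAGMENTS)
longest = max(end - start + 1 for start, end in FRAGMENTS)
```

With these coordinates every junction is covered by both neighboring products, so each junction is sequenced from two independent amplicons.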


DNA and protein sequence analysis 


The nucleotide sequences were assembled and manually 
edited using CodonCode Aligner version 1.5.2 to produce the 
complete sequence of the viral genome. ORF analysis was 
performed using Vector NTI Advance 10 (Invitrogen). RNA 
secondary structures of the 5′ and 3′ UTRs and the ribosomal 
frameshift signals were predicted using the MFOLD program 
with the default parameter settings (Mathews et al., 1999; Zuker, 
2003). Potential 3C-like protease cleavage sites were predicted 
using the NetCorona 1.0 server (Kiemer et al., 2004). Prediction 
of signal peptides and their cleavage sites was conducted using 
SignalP 3.0 server (Nielsen et al., 1997). Potential N-glycosyla- 
tion sites, O-glycosylation sites, and phosphorylation sites were 
predicted using NetNGlyc, NetOGlyc, and NetPhos, respec- 
tively (Blom et al., 1999; Julenius et al., 2005). Prediction of 
transmembrane domains was performed using TMpred (Hof- 
mann and Stoffel, 1993) and TMHMM server 2.0 (Sonnhammer 
et al., 1998). Protein similarity searches were performed using 
BLASTP version 2.2.16, PSI-BLAST against the Protein Data 
Bank (PDB) (Altschul et al., 1997; Schaffer et al., 2001) and 
FASTA version 34.26 against the UniProt protein database with 
the default parameter settings (Pearson and Lipman, 1988). 
Pairwise amino acid comparison was performed using EMBOSS 
Pairwise Alignment Algorithms with the default parameter 
settings (http://www.ebi.ac.uk/emboss/align). Multiple se- 
quence alignments were performed using ClustalX version 
1.83 (Thompson et al., 1997). Unrooted neighbor-joining trees 
were constructed using PAUP version 4.0b10 with the default 
parameter settings. Bootstrap analysis was carried out on 1000 
replicate data sets. The genetic distance between genomes was 
determined using SimPlot version 3.5.1 (Lole et al., 1999). 
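
The bootstrap step mentioned above resamples alignment columns with replacement to generate replicate data sets, from each of which a tree is then built. The following Python sketch illustrates only the resampling step under simple assumptions: the alignment is a dictionary of equal-length strings, and the function names, seed, and toy sequences are hypothetical. It is not the PAUP implementation.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample the columns of an alignment with replacement.

    alignment: dict mapping sequence name -> aligned sequence (equal lengths).
    rng: a random.Random instance, so that runs are reproducible.
    """
    length = len(next(iter(alignment.values())))
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all aligned sequences must have the same length")
    # Draw the same column indices for every sequence so columns stay intact.
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


def bootstrap_replicates(alignment, n=1000, seed=0):
    """Generate n bootstrap replicate alignments (1000 as in the text)."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n)]


# Toy alignment (hypothetical residues, for illustration only).
toy_alignment = {"seqA": "ACGTAC", "seqB": "ACGTTC", "seqC": "TCGAAC"}
toy_replicates = bootstrap_replicates(toy_alignment, n=1000, seed=1)
```

The support value for a branch is then the fraction of replicate trees in which that branch appears.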


Analysis of viral RNA by Northern blotting 


One anti-sense RNA probe complementary to the 3′ end of the 
ECoV genome (nt 30,660-30,946) was developed to evaluate 
the synthesis of genomic and subgenomic RNAs in ECoV- 
infected cells by Northern blotting. The ECoV RNA was 
amplified using one primer pair (forward primer 30660P: 5′ 
AGCAGATGGATGATCCCCTC 3′; reverse primer 30946N: 5′ 
ACTGGGTGGTAACTTAACATGCTG 3′) and the QIAGEN 
OneStep RT-PCR kit (Qiagen). The gel-purified RT-PCR 
products were cloned into a linearized plasmid vector with 
overhanging 3′ T residues (pDrive Cloning Vector, Qiagen). The 
authenticity and orientation of the insert were determined by 
sequencing both strands of DNA with M13 reverse and forward 
primers. Plasmid DNA was linearized with BamHI (Roche), 
phenol/chloroform extracted, ethanol precipitated, and resus- 
pended in nuclease-free water. A digoxigenin (DIG)-labeled 
RNA probe was prepared using the DIG RNA labeling kit 
(Roche) according to the manufacturer’s instructions. 
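
For orientation, the probe template can be described numerically from the primer data given above. The short Python sketch below computes the expected amplicon length from the stated coordinates (nt 30,660 to 30,946), the GC fraction of each primer, and the genome-sense sequence to which the reverse primer should anneal. It assumes the primer names denote the first and last genome positions of the amplicon; the helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (uppercase A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_fraction(seq):
    """Fraction of G and C bases in a sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


# Primer sequences as given in the text (5' to 3').
FORWARD_30660P = "AGCAGATGGATGATCCCCTC"
REVERSE_30946N = "ACTGGGTGGTAACTTAACATGCTG"

# Amplicon coordinates as given in the text (1-based, inclusive).
PROBE_START, PROBE_END = 30660, 30946
amplicon_length = PROBE_END - PROBE_START + 1

# Genome-sense sequence expected at the 3' end of the amplicon,
# assuming the reverse primer is fully complementary to the genome.
genome_sense_site = reverse_complement(REVERSE_30946N)
```

Because the probe must be anti-sense to detect positive-strand viral RNAs, the orientation of the insert relative to the vector promoter used for in vitro transcription determines which strand is labeled, which is why the insert orientation was confirmed by sequencing.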

Intracellular RNA was extracted at 72 h p.i. from ECoV- 
infected HRT-18G cells using the RNAqueous-4PCR kit 
(Ambion). Northern hybridization with the DIG-labeled RNA 
probe was carried out following the protocols that had been 
previously described for equine arteritis virus (Balasuriya et al., 
2004). 


Determination of the leader-body junction sequence 


The leader-body junction sites of all ECoV sg mRNAs were 
RT-PCR amplified and sequenced. Briefly, intracellular RNA was 
extracted from ECoV-infected HRT-18G cells using the 
RNAqueous-4PCR kit (Ambion). Reverse transcription was carried 
out with an RT primer located downstream of the body TRS region 
in a sg mRNA (Table 3) using SuperScript III reverse transcriptase 
(Invitrogen) following the manufacturer’s instructions. Due to the 
nested nature of sg mRNAs, such an RT primer also binds to the 
corresponding positions in all larger viral mRNAs, including the 
genomic RNA. Subsequently, cDNA was PCR amplified with a 
forward primer (1P) located in the leader sequence and a reverse 
primer located just upstream of the RT primer in the body of the 
mRNA (Table 3). Amplification was performed in a volume of 
50 µl: 5 µl PfuTurbo PCR buffer (10×), 0.4 µl dNTP mix (25 mM 
each), 1 µl of each primer (20 µM), 2 µl cDNA template, 1 µl 
PfuTurbo® DNA polymerase, and 39.6 µl nuclease-free water. The 
reaction mixtures were incubated at 95 °C for 2 min, followed by 
35 cycles at 95 °C for 45 s, 50-56 °C for 45 s, and 72 °C for 3 min, 
with a final incubation at 72 °C for 10 min. RT-PCR products 
corresponding to each mRNA species could be distinguished by 
size differences on agarose gel. PCR products were gel-purified and 
sequenced to obtain the leader-body junction sequences for each sg 
mRNA. 
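
As a consistency check on the reaction composition described above, the component volumes can be summed and converted to final concentrations. The Python sketch below takes the volumes as microliters and the stock concentrations as stated (25 mM each dNTP, 20 µM primers, 10× buffer); the dictionary keys and helper names are hypothetical.

```python
# Component volumes in microliters for the PfuTurbo reaction described in the text.
PFUTURBO_MIX_UL = {
    "10x PCR buffer": 5.0,
    "dNTP mix (25 mM each)": 0.4,
    "forward primer (20 uM)": 1.0,
    "reverse primer (20 uM)": 1.0,
    "cDNA template": 2.0,
    "PfuTurbo DNA polymerase": 1.0,
    "nuclease-free water": 39.6,
}


def total_volume(mix):
    """Sum of all component volumes (same unit as the inputs)."""
    return sum(mix.values())


def final_concentration(stock_conc, volume_added, total):
    """Dilution formula C_final = C_stock * V_added / V_total."""
    return stock_conc * volume_added / total


total_ul = total_volume(PFUTURBO_MIX_UL)
dntp_final_mM = final_concentration(25.0, 0.4, total_ul)     # each dNTP, in mM
primer_final_uM = final_concentration(20.0, 1.0, total_ul)   # each primer, in uM
buffer_final_x = final_concentration(10.0, 5.0, total_ul)    # buffer strength, in x
```

Under these readings the components add up to the stated 50 µl, giving roughly 0.2 mM of each dNTP, 0.4 µM of each primer, and 1× buffer.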


Nucleotide sequence accession number 


The nucleotide sequence of ECoV was deposited in GenBank 
under the accession number EF446615. 